Online Grammar Compression for Frequent Pattern Discovery
Authors
Abstract
Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG deriving the string deterministically. An efficient grammar compression builds a smaller CFG by finding duplicated patterns and removing them. This process amounts to frequent pattern discovery by grammatical inference. While any frequent pattern can be obtained in linear time from a preprocessed string, longer patterns require a huge working space, and the whole string must be loaded into memory beforehand. We propose an online algorithm approximating this problem within a compressed space. The main contribution is an improvement of the previously best known approximation ratio $\Omega(1/\lg m)$ to $\Omega(1/(\lg^{*} N \lg m))$, where $m$ is the length of an optimal pattern in a string of length $N$ and $\lg^{*}$ is the iterated logarithm base 2. For a sufficiently large $N$, $\lg^{*} N$ is practically constant. The experimental results show that our algorithm extracts nearly optimal patterns and achieves a significant improvement in memory consumption compared to the offline algorithm.
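To make the idea in the abstract concrete, the following is a minimal sketch of offline grammar compression by repeated pair replacement, in the spirit of Re-Pair. It is not the online algorithm proposed in the paper, whose details are not given here. The function names (`log_star`, `most_frequent_pair`, `compress`, `expand`) and the rule names `X0`, `X1`, ... are assumptions chosen for illustration. The helper `log_star` shows why the iterated logarithm is treated as practically constant: it reaches 4 at N = 65536 and only 5 at N = 2^65536.

```python
import math
from collections import Counter


def log_star(n):
    """Iterated logarithm base 2: number of times lg is applied until the value is <= 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


def most_frequent_pair(seq):
    """Return (pair, count) for the most frequent adjacent pair, or (None, 0).

    Occurrences are counted with overlaps (e.g. "aaa" counts ("a", "a") twice),
    which keeps the sketch short at the cost of an occasional unhelpful rule.
    """
    counts = Counter(zip(seq, seq[1:]))
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


def compress(text):
    """Build a small CFG for `text` by replacing repeated pairs with new nonterminals.

    Returns (start_sequence, rules), where rules maps a hypothetical nonterminal
    name such as "X0" to the pair of symbols it derives.
    """
    seq = list(text)
    rules = {}
    while True:
        pair, count = most_frequent_pair(seq)
        if count < 2:
            break  # no duplicated pair left to remove
        name = f"X{len(rules)}"
        rules[name] = pair
        out = []
        i = 0
        while i < len(seq):
            # Replace non-overlapping occurrences of the pair, scanning left to right.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Derive the unique string generated by `symbol` (deterministic CFG)."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    seq, rules = compress("abababab")
    print(seq, rules)
    print("".join(expand(s, rules) for s in seq))
```

Each rule in the resulting grammar derives a substring that occurred more than once when the rule was created, so expanding a nonterminal with `expand` yields a candidate frequent pattern. This sketch keeps the whole input in memory, which is exactly the cost the abstract says an online, compressed-space method aims to avoid.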
Similar resources
GrammarViz 2.0: A Tool for Grammar-Based Pattern Discovery in Time Series
The problem of frequent and anomalous patterns discovery in time series has received a lot of attention in the past decade. Addressing the common limitation of existing techniques, which require a pattern length to be known in advance, we recently proposed grammar-based algorithms for efficient discovery of variable length frequent and rare patterns. In this paper we present GrammarViz 2.0, an ...
An Efficient Algorithm for Mining Frequent Itemsets over the Entire History of Data Streams
A data stream is a continuous, huge, fast changing, rapid, infinite sequence of data elements. The nature of streaming data makes it essential to use online algorithms which require only one scan over the data for knowledge discovery. In this paper, we propose a new single-pass algorithm, called DSMFI (Data Stream Mining for Frequent Itemsets), to mine all frequent itemsets over the entire hist...
Grammar Compression: Grammatical Inference by Compression and Its Application to Real Data
A grammatical inference algorithm tries to find as small a grammar as possible representing a potentially infinite sequence of strings. Here, let us consider a simple restriction: the input is a finite sequence or it might be a singleton set. Then the restricted problem is called grammar compression: finding the smallest CFG generating just the input. In the last decade many researchers have...
Sequitur-based Inference and Analysis Framework for Malicious System Behavior
Targeted attacks on IT systems are a rising threat against the confidentiality of sensitive data and the availability of critical systems. With the emergence of Advanced Persistent Threats (APTs), it has become more important than ever to fully understand the particulars of such attacks. Grammar inference offers a powerful foundation for the automated extraction of behavioral patterns from sequ...
The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing
The smallest grammar problem—namely, finding a smallest context-free grammar that generates exactly one sequence—is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose a new perspective on this problem by splitting it into two tasks: (1) choosing which words will be the constituents of the grammar and (2) searching ...
Publication date: 2016